Skip to content

feat(sources/s3): migrate to AWS SDK v2 #4069

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
May 8, 2025

Conversation

Juneezee
Copy link
Contributor

Description:

Closes #4054.

This PR migrates the S3 source to use AWS SDK v2 by following the migration guide https://docs.aws.amazon.com/sdk-for-go/v2/developer-guide/migrate-gosdk.html. This brings compatibility improvements and better long-term support from AWS.

Checklist:

  • Tests passing (make test-community)?
  • Lint passing (make lint this requires golangci-lint)?

To verify the changes manually, I:

  1. Created a test S3 bucket.
  2. Uploaded sample files containing dummy secrets.
  3. Set up an IAM role for access.
  4. Ran trufflehog s3 and confirmed it behaves as expected post-migration.
trufflehog-aws-sdk-v2-local-test.mp4

Closes #4054.

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
@Juneezee Juneezee marked this pull request as ready for review April 18, 2025 16:21
@Juneezee Juneezee requested review from a team as code owners April 18, 2025 16:21
Copy link
Collaborator

@rosecodym rosecodym left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for tackling this! It's been on our list for a bit. Unfortunately, this PR is causing some new integration test failures (in TestSource_Validate). It looks like there may be some gaps in how it accommodates all the different combinations of buckets and roles that the source allows.

If you're curious, here are the new failures (generated after I modified the test code to actually print them):

=== RUN   TestSource_Validate/roles_without_buckets,_all_can_access_at_least_one_account_bucket
2025-04-18T11:29:03-07:00	info-0	context	Creating checkpointer	{"timeout": 15}
    s3_integration_test.go:246: 
        	Error Trace:	/Users/cody.rose/src/trufflehog/pkg/sources/s3/s3_integration_test.go:246
        	Error:      	Not equal: 
        	            	expected: 0
        	            	actual  : 1
        	Test:       	TestSource_Validate/roles_without_buckets,_all_can_access_at_least_one_account_bucket
    s3_integration_test.go:248: role "arn:aws:iam::619888638459:role/s3-test-assume-role" could not list any s3 buckets for scanning: operation error S3: ListBuckets, get identity: get credentials: failed to refresh cached credentials, operation error STS: AssumeRole, https response error StatusCode: 403, RequestID: e554f8c8-a8bb-4aa0-beb5-f83f9b71ba18, api error InvalidClientTokenId: The security token included in the request is invalid.
    --- FAIL: TestSource_Validate/roles_without_buckets,_all_can_access_at_least_one_account_bucket (0.62s)
=== RUN   TestSource_Validate/role_and_buckets,_can_access_at_least_one_bucket
2025-04-18T11:29:04-07:00	info-0	context	Creating checkpointer	{"timeout": 15}
    s3_integration_test.go:246: 
        	Error Trace:	/Users/cody.rose/src/trufflehog/pkg/sources/s3/s3_integration_test.go:246
        	Error:      	Not equal: 
        	            	expected: 0
        	            	actual  : 3
        	Test:       	TestSource_Validate/role_and_buckets,_can_access_at_least_one_bucket
    s3_integration_test.go:248: could not get regional client for bucket "truffletestbucket-no-access": could not get s3 region for bucket: truffletestbucket-no-access: operation error S3: HeadBucket, get identity: get credentials: failed to refresh cached credentials, operation error STS: AssumeRole, https response error StatusCode: 403, RequestID: 4aff8536-ba55-4e52-b395-81a0df48042f, api error InvalidClientTokenId: The security token included in the request is invalid.
    s3_integration_test.go:248: could not get regional client for bucket "truffletestbucket-s3-role-assumption": could not get s3 region for bucket: truffletestbucket-s3-role-assumption: operation error S3: HeadBucket, get identity: get credentials: failed to refresh cached credentials, operation error STS: AssumeRole, https response error StatusCode: 403, RequestID: 59dea661-dee0-4327-a3a2-a4e71f0010b2, api error InvalidClientTokenId: The security token included in the request is invalid.
    s3_integration_test.go:248: role "arn:aws:iam::619888638459:role/s3-test-assume-role" could not list objects in any bucket
    --- FAIL: TestSource_Validate/role_and_buckets,_can_access_at_least_one_bucket (0.47s)
2025-04-18T11:29:04-07:00	info-0	context	Creating checkpointer	{"timeout": 15}
    s3_integration_test.go:246: 
        	Error Trace:	/Users/cody.rose/src/trufflehog/pkg/sources/s3/s3_integration_test.go:246
        	Error:      	Not equal: 
        	            	expected: 1
        	            	actual  : 6
        	Test:       	TestSource_Validate/roles_and_buckets,_one_error_per_role_that_cannot_access_at_least_one_bucket
    s3_integration_test.go:248: could not get regional client for bucket "truffletestbucket-no-access": could not get s3 region for bucket: truffletestbucket-no-access: operation error S3: HeadBucket, get identity: get credentials: failed to refresh cached credentials, operation error STS: AssumeRole, https response error StatusCode: 403, RequestID: fc4aad7f-7ee5-481d-8e38-0e255888b915, api error InvalidClientTokenId: The security token included in the request is invalid.
    s3_integration_test.go:248: could not get regional client for bucket "truffletestbucket-s3-role-assumption": could not get s3 region for bucket: truffletestbucket-s3-role-assumption: operation error S3: HeadBucket, get identity: get credentials: failed to refresh cached credentials, operation error STS: AssumeRole, https response error StatusCode: 403, RequestID: cbed8d71-4889-4fa5-b924-7776254c7c20, api error InvalidClientTokenId: The security token included in the request is invalid.
    s3_integration_test.go:248: role "arn:aws:iam::619888638459:role/s3-test-assume-role" could not list objects in any bucket
    s3_integration_test.go:248: could not get regional client for bucket "truffletestbucket-no-access": could not get s3 region for bucket: truffletestbucket-no-access: operation error S3: HeadBucket, get identity: get credentials: failed to refresh cached credentials, operation error STS: AssumeRole, https response error StatusCode: 403, RequestID: 41a29cae-3245-471d-8666-b17010cc5412, api error InvalidClientTokenId: The security token included in the request is invalid.
    s3_integration_test.go:248: could not get regional client for bucket "truffletestbucket-s3-role-assumption": could not get s3 region for bucket: truffletestbucket-s3-role-assumption: operation error S3: HeadBucket, get identity: get credentials: failed to refresh cached credentials, operation error STS: AssumeRole, https response error StatusCode: 403, RequestID: 2deb96fb-a2ce-4878-9758-40e661c94080, api error InvalidClientTokenId: The security token included in the request is invalid.
    s3_integration_test.go:248: role "arn:aws:iam::619888638459:role/test-no-access" could not list objects in any bucket
    --- FAIL: TestSource_Validate/roles_and_buckets,_one_error_per_role_that_cannot_access_at_least_one_bucket (0.97s)
2025-04-18T11:29:05-07:00	info-0	context	Creating checkpointer	{"timeout": 15}
    s3_integration_test.go:246: 
        	Error Trace:	/Users/cody.rose/src/trufflehog/pkg/sources/s3/s3_integration_test.go:246
        	Error:      	Not equal: 
        	            	expected: 1
        	            	actual  : 3
        	Test:       	TestSource_Validate/role_and_buckets,_a_bucket_doesn't_even_exist
    s3_integration_test.go:248: could not get regional client for bucket "not-a-real-bucket-asljdhmglasjgvklhsdaljfh": could not get s3 region for bucket: not-a-real-bucket-asljdhmglasjgvklhsdaljfh: operation error S3: HeadBucket, get identity: get credentials: failed to refresh cached credentials, operation error STS: AssumeRole, https response error StatusCode: 403, RequestID: 4f7d2f8b-22a7-4f94-89dc-0d7a72f1301a, api error InvalidClientTokenId: The security token included in the request is invalid.
    s3_integration_test.go:248: could not get regional client for bucket "truffletestbucket-s3-role-assumption": could not get s3 region for bucket: truffletestbucket-s3-role-assumption: operation error S3: HeadBucket, get identity: get credentials: failed to refresh cached credentials, operation error STS: AssumeRole, https response error StatusCode: 403, RequestID: 8dc601e7-e091-4bd8-b3f0-bdf0900266bd, api error InvalidClientTokenId: The security token included in the request is invalid.
    s3_integration_test.go:248: role "arn:aws:iam::619888638459:role/s3-test-assume-role" could not list objects in any bucket
    --- FAIL: TestSource_Validate/role_and_buckets,_a_bucket_doesn't_even_exist (0.48s)

Reference: #4069 (comment)
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
@Juneezee
Copy link
Contributor Author

Juneezee commented Apr 19, 2025

@rosecodym Thank you so much for reviewing this PR! I really appreciate it.

Regarding the integration test failures, I believe they are caused by #4069 (comment), which I have addressed in commit a5411bf. When you have a chance, could you please re-run the integration tests to see if this resolves the issue?

UPDATE: Another fix applied in commit 2191e18, see #4069 (comment).

Note

Just to clarify, the following code handles the scenario tested in TestSource_Validate:

	if roleArn != "" {
		// A valid credentials is required to assume IAM role. aws.AnonymousCredentials is not a valid credentials.
		// If the value of credsProvider is aws.AnonymousCredentials{} from the above switch-case,
		// we will need to set credsProvider to nil to use SDK's default credential chain.
		_, isUnauthenticated := credsProvider.(aws.AnonymousCredentials)
		if isUnauthenticated {
			credsProvider = nil
		}

The TestSource_Validate test provides both AccessKey and Roles, which is equivalent to running the trufflehog s3 command with all three flags: --key, --secret, and --role-arn.

conn, err := anypb.New(&sourcespb.S3{
Credential: &sourcespb.S3_AccessKey{
AccessKey: &credentialspb.KeySecret{
Key: s3key,
Secret: s3secret,
},
},
Buckets: tt.buckets,
IgnoreBuckets: tt.ignoreBuckets,
Roles: tt.roles,
})

P.S. You might have already noticed, but s3_integration_test.go currently has two TestSourceChunksNoResumption test case declarations. I’ve opened a separate PR to clean that up: #4048.

@Juneezee
Copy link
Contributor Author

Juneezee commented Apr 19, 2025

I performed a deeper debugging session and discovered an undocumented behavioral difference between V1’s GetBucketRegionWithClient and V2's GetBucketRegion.

Specifically, V1’s GetBucketRegionWithClient does not return an error when credentials are invalid, whereas V2’s GetBucketRegion fails under the same conditions.

In the following code examples, I used a fake access key, fake secret key, and a fake role ARN.

GetBucketRegionWithClient in V1

package main

import (
	"context"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/credentials"
	"github.com/aws/aws-sdk-go/aws/credentials/stscreds"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
	"github.com/aws/aws-sdk-go/service/s3/s3manager"
	"github.com/aws/aws-sdk-go/service/sts"
)

func main() {
	ctx := context.Background()

	cfg := &aws.Config{
		Region:      aws.String("us-east-1"),
		Credentials: credentials.NewStaticCredentials("fakeDummy", "fakeDummy", ""),
	}
	sess, err := session.NewSession(cfg)
	if err != nil {
		log.Fatal(err)
	}

	stsClient := sts.New(sess)
	cfg.Credentials = stscreds.NewCredentialsWithClient(stsClient, "000000000:role/service-role/fake-non-existent-role")

	s3Client := s3.New(sess)

	region, err := s3manager.GetBucketRegionWithClient(ctx, s3Client, "truffletestbucket-s3-role-assumption")
	log.Println("Region: ", region)
	log.Println("Err: ", err)
}

Running the code above produces the following output:

2025/04/19 19:04:01 Region:  us-east-2
2025/04/19 19:04:01 Err:  <nil>

Upon inspecting the source code of V1's GetBucketRegionWithClient, we can see that it bypasses the configured credentials and issues the HeadBucket request using credentials.AnonymousCredentials:

func GetBucketRegionWithClient(ctx aws.Context, svc s3iface.S3API, bucket string, opts ...request.Option) (string, error) {
req, _ := svc.HeadBucketRequest(&s3.HeadBucketInput{
Bucket: aws.String(bucket),
})
req.Config.S3ForcePathStyle = aws.Bool(true)

req.Config.Credentials = credentials.AnonymousCredentials

GetBucketRegion in V2

package main

import (
	"context"
	"log"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/credentials"
	"github.com/aws/aws-sdk-go-v2/credentials/stscreds"
	s3manager "github.com/aws/aws-sdk-go-v2/feature/s3/manager"
	"github.com/aws/aws-sdk-go-v2/service/s3"
	"github.com/aws/aws-sdk-go-v2/service/sts"
)

func main() {
	ctx := context.Background()

	cfg, err := config.LoadDefaultConfig(ctx,
		config.WithRegion("us-east-1"),
		config.WithCredentialsProvider(credentials.NewStaticCredentialsProvider("fakeDummy", "fakeDummy", "")),
	)
	if err != nil {
		log.Fatal(err)
	}

	stsClient := sts.NewFromConfig(cfg)
	provider := stscreds.NewAssumeRoleProvider(stsClient, "arn:aws:iam::000000000:role/service-role/fake-non-existent-role")
	cfg.Credentials = aws.NewCredentialsCache(provider)

	s3Client := s3.NewFromConfig(cfg)

	region, err := s3manager.GetBucketRegion(ctx, s3Client, "truffletestbucket-s3-role-assumption")
	log.Println("Region: ", region)
	log.Println("Err: ", err)
}

Running the above code results in the following output:

2025/04/19 19:03:17 Region:
2025/04/19 19:03:17 Err:  operation error S3: HeadBucket, get identity: get credentials: failed to refresh cached credentials, operation error STS: AssumeRole, https response error StatusCode: 403, RequestID: 331c40dd-d8c4-42ca-9105-fcff18f9a8fb, api error InvalidClientTokenId: The security token included in the request is invalid.

The error originates from the following call chain:

  1. https://github.com/aws/aws-sdk-go-v2/blob/service/s3/v1.79.2/service/s3/auth.go#L274-L283
  2. https://github.com/aws/aws-sdk-go-v2/blob/v1.36.3/internal/auth/smithy/credentials_adapter.go#L40-L43

Solution

It turns out that valid credentials are not required to retrieve a bucket's region using the HeadBucket API.

A simple HEAD request returns the x-amz-bucket-region header, even when the response status is 403 Forbidden:

curl --head truffletestbucket-s3-role-assumption.s3.amazonaws.com

HTTP/1.1 403 Forbidden
x-amz-bucket-region: us-east-2
x-amz-request-id: 7E2S67HYJWFER9XD
x-amz-id-2: ZRbFyUcIJDb7y15lJvnG9M2vGqF55h7NQLaYJvQFh2CQTsMsCZDpwHa1VIQGHIHd52sOwaJwJ3vYDkxULXKp+tlDH1l5lD2Tr5dbpV+CS/E=
Content-Type: application/xml
Transfer-Encoding: chunked
Date: Sat, 19 Apr 2025 11:47:05 GMT
Server: AmazonS3

To make V2’s GetBucketRegion return the region even when credentials are invalid, you can override the options to explicitly set Credentials to nil:

-	region, err := s3manager.GetBucketRegion(ctx, s3Client, "truffletestbucket-s3-role-assumption")
+	region, err := s3manager.GetBucketRegion(ctx, s3Client, "truffletestbucket-s3-role-assumption", func(options *s3.Options) {
+		options.Credentials = nil
+	})

Here is the complete updated code:

package main

import (
	"context"
	"log"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/credentials"
	"github.com/aws/aws-sdk-go-v2/credentials/stscreds"
	s3manager "github.com/aws/aws-sdk-go-v2/feature/s3/manager"
	"github.com/aws/aws-sdk-go-v2/service/s3"
	"github.com/aws/aws-sdk-go-v2/service/sts"
)

func main() {
	ctx := context.Background()

	cfg, err := config.LoadDefaultConfig(ctx,
		config.WithRegion("us-east-1"),
		config.WithCredentialsProvider(credentials.NewStaticCredentialsProvider("fakeDummy", "fakeDummy", "")),
	)
	if err != nil {
		log.Fatal(err)
	}

	stsClient := sts.NewFromConfig(cfg)
	provider := stscreds.NewAssumeRoleProvider(stsClient, "arn:aws:iam::000000000:role/service-role/fake-non-existent-role")
	cfg.Credentials = aws.NewCredentialsCache(provider)

	s3Client := s3.NewFromConfig(cfg)

	region, err := s3manager.GetBucketRegion(ctx, s3Client, "truffletestbucket-s3-role-assumption", func(options *s3.Options) {
		options.Credentials = nil
	})
	log.Println("Region: ", region)
	log.Println("Err: ", err)
}

Running the above code results in the following output:

2025/04/19 19:59:06 Region:  us-east-2
2025/04/19 19:59:06 Err:  <nil>

Reference: #4069 (comment)
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
@Juneezee Juneezee requested a review from rosecodym April 19, 2025 12:15
@rosecodym
Copy link
Collaborator

Thank you for the updates and detailed investigation! It looks like your change has fixed the issue with the validation integration test, but I'm still seeing some timeouts when running the tests on your branch. I'm investigating to see whether your PR actually changed anything or this is some ephemera related to how we're testing S3 (which, as you've discovered, isn't perfect).

@Juneezee
Copy link
Contributor Author

I'm still seeing some timeouts when running the tests on your branch

@rosecodym Would you mind sharing which tests are timing out and the corresponding error logs?

@rosecodym
Copy link
Collaborator

Some of them were the tests in s3_integration_test.go, but they're not consistent and not any different from the runs before your change, so you don't need to address them. I think we just need to increase some timeouts on those tests, but you can let us handle that.

However, go test ./pkg/sources/s3 -run "^TestSource_Chunks$" no longer terminates after your change. (There's no error or anything - it just hangs.) I'm not sure where exactly it's hanging, because attaching a debugger resolves the issue and I haven't had time to do printf debugging.

@Juneezee
Copy link
Contributor Author

However, go test ./pkg/sources/s3 -run "^TestSource_Chunks$" no longer terminates after your change. (There's no error or anything - it just hangs.)

@rosecodym After another debugging session, I believe this issue in the TestSource_Chunks test (in s3_test.go) is not caused by the changes introduced in this PR. I tested it on the main branch using my own S3 bucket and was able to reproduce the same hanging behavior.

I have submitted the fix in a separate PR: #4048. That one can be merged independently before this PR.

Copy link
Collaborator

@rosecodym rosecodym left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your detailed investigation and work! I have one remaining outstanding question about the behavior before and after this change for a certain combination of options - please see my inline comment.

Comment on lines 160 to 162
// A valid credentials is required to assume IAM role. aws.AnonymousCredentials is not a valid credentials.
// If the value of credsProvider is aws.AnonymousCredentials{} from the above switch-case,
// we will need to set credsProvider to nil to use SDK's default credential chain.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was the previous behavior if a user explicitly requested an unauthenticated connection and also attempted to assume a role? My reading of the comment preceding this one (in the default switch case) is that such a combination of options should fail, because the default credentials play should only be used if the user fails to configure any authentication method at all. If I'm correct we should preserve that behavior.

Copy link
Contributor Author

@Juneezee Juneezee Apr 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was the previous behavior if a user explicitly requested an unauthenticated connection and also attempted to assume a role

We will see an authentication error message with --log-level=5. See the first screen recording role-without-auth-v1.mp4 below.

Note

From aws/aws-sdk-go-v2#2111 (comment):

Your application needs to get valid credentials first, and then call the STS client to assume the role programmatically, only then you can make API calls with the temporary creds that assumeRole returns.

  1. AWS SDK V1

    role-without-auth-v1.mp4
  2. AWS SDK V2, before commit 8cce87c

    role-without-auth-v2-continue.mp4
  3. AWS SDK V2, after commit 8cce87c

    role-without-auth-v2-break.mp4

Commit 8cce87c fixes an infinite loop issue in the s3.NewListObjectsV2Paginator pagination. The HasMorePages does not return false when an error occurs. We have to break out of the loop.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the extra info! But my question was actually about the sourcespb.S3_Unauthenticated case in the switch statement above, which is not represented by those screen recordings. In retrospect, I can understand the confusion - the "unauthenticated" case is only available in TruffleHog Enterprise! However, we should still preserve its existing behavior.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But my question was actually about the sourcespb.S3_Unauthenticated case in the switch statement above, which is not represented by those screen recordings.

Actually the 3 screen recordings above were indeed the case of sourcespb.S3_Unauthenticated. According to ScanS3 function in pkg/engine/s3.go:

connection := &sourcespb.S3{
Credential: &sourcespb.S3_Unauthenticated{},
}
if c.CloudCred {
if len(c.Key) > 0 || len(c.Secret) > 0 || len(c.SessionToken) > 0 {
return sources.JobProgressRef{}, fmt.Errorf("cannot use cloud environment and static credentials together")
}
connection.Credential = &sourcespb.S3_CloudEnvironment{}
}
if len(c.Key) > 0 && len(c.Secret) > 0 {
if len(c.SessionToken) > 0 {
connection.Credential = &sourcespb.S3_SessionToken{
SessionToken: &credentialspb.AWSSessionTokenSecret{
Key: c.Key,
Secret: c.Secret,
SessionToken: c.SessionToken,
},
}
} else {
connection.Credential = &sourcespb.S3_AccessKey{
AccessKey: &credentialspb.KeySecret{
Key: c.Key,
Secret: c.Secret,
},
}
}
}

The Credential is sourcespb.S3_Unauthenticated when we do not pass any of the following flags to trufflehog s3 command:

  • --key, --secret
  • --key, --secret, --session-token
  • --cloud-environment

Providing --role-arn flag does not affect the Credential field.

In retrospect, I can understand the confusion - the "unauthenticated" case is only available in TruffleHog Enterprise! However, we should still preserve its existing behavior.

Got it. I have reverted the change in commit 49a8209. The sourcespb.S3_Unauthenticated case in V2 now matches existing V1 behavior.

unauthenticated-v1.mp4
unauthenticated-v2.mp4

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
@Juneezee
Copy link
Contributor Author

Juneezee commented May 5, 2025

Update on #4069 (comment).

The AWS team has confirmed that this is a bug (aws/aws-sdk-go-v2#3077), and the fix will be included in the next SDK release.

Comment on lines 160 to 162
// A valid credentials is required to assume IAM role. aws.AnonymousCredentials is not a valid credentials.
// If the value of credsProvider is aws.AnonymousCredentials{} from the above switch-case,
// we will need to set credsProvider to nil to use SDK's default credential chain.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the extra info! But my question was actually about the sourcespb.S3_Unauthenticated case in the switch statement above, which is not represented by those screen recordings. In retrospect, I can understand the confusion - the "unauthenticated" case is only available in TruffleHog Enterprise! However, we should still preserve its existing behavior.

// account, but the role probably doesn't have access to all of them). This makes it expected behavior
// and therefore not an error.
ctx.Logger().V(3).Info("could not list objects in bucket", "err", err)
break
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tell me if I understand this right: When using the V1 SDK, auth errors would occur when calling ListObjectsV2PagesWithContext, meaning that there would be at most one (per bucket). But after this change, they occur with each individual call to paginator.NextPage, meaning that we will get one per page, so we have to explicitly break out of the pagination loop. This means that if paginator.NextPage returns some other error, we'll also break out of the loop, whereas before we'd just try the next page.

This seems like a functional change so I want to ensure I understand it. (It doesn't sound avoidable.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, your understanding is correct.

This seems like a functional change so I want to ensure I understand it. (It doesn't sound avoidable.)

Yes, it is unavoidable. In V1, pagination is handled using a callback passed to ListObjectsV2PagesWithContext. If an error occurs, ListObjectsV2PagesWithContext returns err immediately.

pageNumber := 1
err = regionalClient.ListObjectsV2PagesWithContext(
ctx,
input,
func(page *s3.ListObjectsV2Output, _ bool) bool {
pageMetadata := pageMetadata{
bucket: bucket,
pageNumber: pageNumber,
client: regionalClient,
page: page,
}
processingState := processingState{
errorCount: &errorCount,
objectCount: &objectCount,
}
s.pageChunker(ctx, pageMetadata, processingState, chunksChan)
pageNumber++
return true
})

In V2, pagination is handled using HasMorePages() and NextPage(). As mentioned in #4069 (comment), HasMorePages() does not return false when the first page returns an error. I use break instead of return here to allow scanning to continue with the next bucket, preserving the behavior from V1.

for bucketIdx := pos.index; bucketIdx < bucketsToScanCount; bucketIdx++ {
	...

	paginator := s3.NewListObjectsV2Paginator(regionalClient, input)
	for paginator.HasMorePages() {
		output, err := paginator.NextPage(ctx)
		if err != nil {
			break
		}
		...
	}
}

Reference: #4069 (comment)
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
@Juneezee Juneezee requested a review from rosecodym May 6, 2025 14:49
@rosecodym rosecodym requested a review from abmussani May 7, 2025 20:07
@rosecodym
Copy link
Collaborator

Update on #4069 (comment).

The AWS team has confirmed that this is a bug (aws/aws-sdk-go-v2#3077), and the fix will be included in the next SDK release.

Does that mean that we need to wait for them to release the next version before merging this?

@Juneezee
Copy link
Contributor Author

Juneezee commented May 7, 2025

Update on #4069 (comment).
The AWS team has confirmed that this is a bug (aws/aws-sdk-go-v2#3077), and the fix will be included in the next SDK release.

Does that mean that we need to wait for them to release the next version before merging this?

@rosecodym The fix has already been released in github.com/aws/aws-sdk-go-v2/feature/s3/manager v1.17.75. I have updated go.mod in commit f375a01 😊

@rosecodym
Copy link
Collaborator

Update on #4069 (comment).
The AWS team has confirmed that this is a bug (aws/aws-sdk-go-v2#3077), and the fix will be included in the next SDK release.

Does that mean that we need to wait for them to release the next version before merging this?

@rosecodym The fix has already been released in github.com/aws/aws-sdk-go-v2/feature/s3/manager v1.17.75. I have updated go.mod in commit f375a01 😊

oh whoops!

rosecodym
rosecodym previously approved these changes May 8, 2025
Copy link
Collaborator

@rosecodym rosecodym left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for your detailed investigation and work and for accommodating all the back and forth.

@rosecodym rosecodym dismissed their stale review May 8, 2025 14:30

i'm seeing a new integration test failure; i'll provide details once i have time

Copy link
Collaborator

@rosecodym rosecodym left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

false alarm! the test failure was some transient thing!

@rosecodym rosecodym merged commit 5f56550 into trufflesecurity:main May 8, 2025
13 checks passed
ahrav pushed a commit that referenced this pull request May 8, 2025
Fixes: 5f56550 ("feat(sources/s3): migrate to AWS SDK v2 (#4069)")

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
abmussani added a commit to abmussani/trufflehog that referenced this pull request May 16, 2025
* main:
  [Feat] Added Mux API Analyzer (trufflesecurity#4128)
  fixed name of netlify analyzer in cli output (trufflesecurity#4140)
  fix(discordwebhook): Update Discord webhook detector to support 19-digit IDs (trufflesecurity#4133)
  [Feat] Added New AccuWeather Detector Version (trufflesecurity#4114)
  [Feat] Added Ngrok API Key Analyzer (trufflesecurity#4110)
  Improved JDBC Detector Regex (trufflesecurity#4109)
  [Feat] Detector implementation for Azure Configuration Connection String Key (trufflesecurity#3939)
  test(sources/s3): fix missing region error (trufflesecurity#4131)
  feat(sources/s3): migrate to AWS SDK v2 (trufflesecurity#4069)
  Update PreCommit.md (trufflesecurity#4112)
  Exclusion of FalsePositive GH's usernames in PrivateKeyDetector (trufflesecurity#4046)
  Monday App Analyzer (trufflesecurity#4120)
  [Feat] Detector implementation for Azure API Management Direct Management Key (trufflesecurity#3938)
  Fastly Analyzer (trufflesecurity#4082)
  Postman Code Uses Consistent Casing for Id Var Names (trufflesecurity#4124)
  Normalize UID to Uid in Postman Code (trufflesecurity#4125)
  postman_client.IDNameUUID becomes IdNameUid (trufflesecurity#4123)
  Fixed Kontent Detector (trufflesecurity#4122)

# Conflicts:
#	pkg/analyzer/analyzers/analyzers.go
#	pkg/analyzer/cli.go
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Migrate to AWS SDK v2
2 participants